Bilingual Connections for Trilingual Corpora: An XML Approach
Abstract
This paper describes the design and development of a trilingual spontaneous-speech corpus for statistical speech-to-speech translation. The languages considered are Catalan, Spanish and US English. The corpus was built with two needs in mind: the strong demand for multilingual collections of online data in statistical translation, and the need for data that are parallel or aligned, that contain different types of linguistic information, and that can be used by different translation systems. Our aim has therefore been to create a linguistically enriched resource with an XML-based DTD that allows useful, transparent and flexible storage of the data. These resources are also valuable for a wide range of other Natural Language Processing applications, such as multilingual resource acquisition and word sense discrimination.
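The abstract does not reproduce the DTD itself. As a rough illustration of how an XML layout can store aligned trilingual data as pairwise (bilingual) connections, the following Python sketch uses the standard `xml.etree.ElementTree` module. The element names (`turn`, `seg`, `link`), attributes, language codes and sample sentences are all hypothetical and are not taken from the paper's DTD.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL layout, not the paper's DTD: each <turn> holds one <seg> per
# language plus pairwise <link> elements, so a trilingual record is expressed
# as a set of bilingual connections.

def build_turn(turn_id, segments, pairs):
    """Build one aligned turn.

    segments: dict mapping language code -> transcribed text
    pairs: iterable of (src_lang, tgt_lang) bilingual connections
    """
    turn = ET.Element("turn", id=turn_id)
    for lang, text in segments.items():
        seg = ET.SubElement(turn, "seg", lang=lang, id=f"{turn_id}.{lang}")
        seg.text = text
    for src, tgt in pairs:
        # Each link references the ids of exactly two <seg> elements.
        ET.SubElement(turn, "link", src=f"{turn_id}.{src}", tgt=f"{turn_id}.{tgt}")
    return turn

def linked_text(turn, src_lang, tgt_lang):
    """Return (source, target) text for a linked pair, or None if no direct link."""
    by_id = {s.get("id"): s.text for s in turn.iter("seg")}
    tid = turn.get("id")
    for link in turn.iter("link"):
        if link.get("src") == f"{tid}.{src_lang}" and link.get("tgt") == f"{tid}.{tgt_lang}":
            return by_id[link.get("src")], by_id[link.get("tgt")]
    return None

# Illustrative sample data only.
turn = build_turn(
    "t1",
    {"ca": "Bon dia", "es": "Buenos días", "en": "Good morning"},
    [("ca", "es"), ("es", "en")],
)
print(ET.tostring(turn, encoding="unicode"))
print(linked_text(turn, "es", "en"))  # ('Buenos días', 'Good morning')
print(linked_text(turn, "ca", "en"))  # None: no direct Catalan-English link
```

Storing only pairwise links keeps each alignment simple to validate, while a pair that is not linked directly (here Catalan-English) could still be reached through a shared pivot segment.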
Publication date: 2004